22 research outputs found

    PSF toolkit: an R package for pathway curation and topology-aware analysis

    Most high throughput genomic data analysis pipelines currently rely on over-representation or gene set enrichment analysis (ORA/GSEA) approaches for functional analysis. In contrast, topology-based pathway analysis methods, which offer a more biologically informed perspective by incorporating interaction and topology information, have remained underutilized and inaccessible due to various limiting factors. These methods heavily rely on the quality of pathway topologies and often utilize predefined topologies from databases without assessing their correctness. To address these issues and make topology-aware pathway analysis more accessible and flexible, we introduce the PSF (Pathway Signal Flow) toolkit R package. Our toolkit integrates pathway curation and topology-based analysis, providing interactive and command-line tools that facilitate pathway importation, correction, and modification from diverse sources. This enables users to perform topology-based pathway signal flow analysis in both interactive and command-line modes. To showcase the toolkit’s usability, we curated 36 KEGG signaling pathways and conducted several use-case studies, comparing our method with ORA and the topology-based signaling pathway impact analysis (SPIA) method. The results demonstrate that the algorithm can effectively identify ORA enriched pathways while providing more detailed branch-level information. Moreover, in contrast to the SPIA method, it offers the advantage of being cut-off free and less susceptible to the variability caused by selection thresholds. By combining pathway curation and topology-based analysis, the PSF toolkit enhances the quality, flexibility, and accessibility of topology-aware pathway analysis. Researchers can now easily import pathways from various sources, correct and modify them as needed, and perform detailed topology-based pathway signal flow analysis. 
In summary, our PSF toolkit offers an integrated solution that addresses the limitations of current topology-based pathway analysis methods. By providing interactive and command-line tools for pathway curation and topology-based analysis, we empower researchers to conduct comprehensive pathway analyses across a wide range of applications
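For orientation, the over-representation analysis (ORA) that this abstract uses as a baseline is commonly computed as an upper-tail hypergeometric test on a thresholded gene list. The sketch below is a generic illustration of that test, not code from the PSF toolkit; the function name `ora_p_value` and its parameter names are assumptions introduced here.

```python
from math import comb


def ora_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes measured in total (hypothetical name)
    n_pathway:  number of those genes annotated to the pathway
    n_selected: number of genes passing the significance cut-off
    n_overlap:  number of selected genes that belong to the pathway
    """
    max_overlap = min(n_pathway, n_selected)
    # Number of ways to draw the selected list from the whole universe.
    total = comb(n_universe, n_selected)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers, chosen only for illustration.
    print(ora_p_value(n_universe=10, n_pathway=5, n_selected=5, n_overlap=5))
```

Because `n_selected` and `n_overlap` both depend on where the significance cut-off is placed, the p-value moves with that threshold; this is the kind of threshold sensitivity that cut-off-free methods aim to avoid.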

    The Evolving Faces of the SARS-CoV-2 Genome

    Surveillance of the evolving SARS-CoV-2 genome combined with epidemiological monitoring and emerging vaccination became paramount tasks to control the pandemic which is rapidly changing in time and space. Genomic surveillance must combine generation and sharing sequence data with appropriate bioinformatics monitoring and analysis methods. We applied molecular portrayal using self-organizing maps machine learning (SOM portrayal) to characterize the diversity of the virus genomes, their mutual relatedness and development since the beginning of the pandemic. The genetic landscape obtained visualizes the relevant mutations in a lineage-specific fashion and provides developmental paths in genetic state space from early lineages towards the variants of concern alpha, beta, gamma and delta. The different genes of the virus have specific footprints in the landscape reflecting their biological impact. SOM portrayal provides a novel option for ‘bioinformatics surveillance’ of the pandemic, with strong odds regarding visualization, intuitive perception and ‘personalization’ of the mutational patterns of the virus genomes
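Several entries in this list rely on self-organizing maps (SOM). As a rough illustration of the underlying technique, the following is a minimal textbook SOM in pure Python: each input vector is assigned to its best-matching unit on a 2D grid, and that unit and its grid neighbours are pulled toward the input. It is a sketch under simplifying assumptions (linear decay of learning rate and neighbourhood radius, Gaussian neighbourhood, random initialization) and does not reproduce the authors' SOM portrayal pipeline; all function and parameter names are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, initialized uniformly in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


if __name__ == "__main__":
    # Toy feature vectors scaled to [0, 1]; real inputs would be, for
    # example, per-sample profiles of mutation or allele frequencies.
    toy = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
    som = train_som(toy, rows=3, cols=3)
    for x in toy:
        print(x, "->", best_matching_unit(som, x))
```

Inputs with similar profiles tend to land on the same or adjacent grid nodes, which is what makes the trained grid usable as a two-dimensional "portrait" of high-dimensional data.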

    Transcriptome-Guided Drug Repositioning

    Drug repositioning can save considerable time and resources and significantly speed up the drug development process. The increasing availability of drug action and disease-associated transcriptome data makes it an attractive source for repositioning studies. Here, we have developed a transcriptome-guided approach for drug/biologics repositioning based on multi-layer self-organizing maps (ml-SOM). It allows for analyzing multiple transcriptome datasets by segmenting them into layers of drug action- and disease-associated transcriptome data. A comparison of expression changes in clusters of functionally related genes across the layers identifies “drug target” spots in disease layers and evaluates the repositioning possibility of a drug. The repositioning potential for two approved biologics drugs (infliximab and brodalumab) confirmed the drugs’ action for approved diseases (ulcerative colitis and Crohn’s disease for infliximab and psoriasis for brodalumab). We showed the potential efficacy of infliximab for the treatment of sarcoidosis, but not chronic obstructive pulmonary disease (COPD). Brodalumab failed to affect dysregulated functional gene clusters in Crohn’s disease (CD) and systemic juvenile idiopathic arthritis (SJIA), clearly indicating that it may not be effective in the treatment of these diseases. In conclusion, ml-SOM offers a novel approach for transcriptome-guided drug repositioning that could be particularly useful for biologics drugs

    Population Levels Assessment of the Distribution of Disease-Associated Variants With Emphasis on Armenians – A Machine Learning Approach

    Background: During the last decades, a number of genome-wide association studies (GWASs) have identified numerous single nucleotide polymorphisms (SNPs) associated with different complex diseases. However, associations reported in one population are often conflicting and do not replicate when studied in other populations. One of the reasons could be that most GWAS employ a case-control design in one or a limited number of populations, while little attention has been paid to the global distribution of disease-associated alleles across different populations. Moreover, the majority of GWAS have been performed on selected European, African, and Chinese populations, and a considerable number of populations remain understudied. Aim: We have investigated the global distribution of so far discovered disease-associated SNPs across worldwide populations of different ancestry and geographical regions, with a special focus on the understudied population of Armenians. Data and Methods: We have used genotyping data from the Human Genome Diversity Project and of the Armenian population and combined them with disease-associated SNP data taken from public repositories, leading to a final dataset of 44,234 markers. Their frequency distribution across 1039 individuals from 53 populations was analyzed using self-organizing maps (SOM) machine learning. Our SOM portrayal approach reduces data dimensionality, clusters SNPs with similar frequency profiles and provides two-dimensional data images which enable visual evaluation of disease-associated SNP landscapes among human populations. Results: We find that populations from Africa, Oceania, and America show specific patterns of minor allele frequencies of disease-associated SNPs, while populations from Europe, Middle East, Central South Asia, and Armenia mostly share similar patterns. Importantly, different sets of SNPs are associated with common polygenic diseases, such as cancer, diabetes and neurodegeneration, in populations from different geographic regions. Armenians are characterized by a set of SNPs that are distinct from other populations from the neighboring geographical regions. Conclusion: Genetic associations of diseases vary considerably across populations, which necessitates health-related genotyping efforts, especially for so far understudied populations. SOM portrayal represents a novel, promising method in population genetic research with special strength in visualization-based comparison of SNP data

    High-Resolution Cartography of the Transcriptome and Methylome Landscapes of Diffuse Gliomas

    Molecular mechanisms of lower-grade (II–III) diffuse gliomas (LGG) are still poorly understood, mainly because of their heterogeneity. They split into astrocytoma- (IDH-A) and oligodendroglioma-like (IDH-O) tumors both carrying mutation(s) at the isocitrate dehydrogenase (IDH) gene and into IDH wild type (IDH-wt) gliomas of glioblastoma resemblance. We generated detailed maps of the transcriptomes and DNA methylomes, revealing that cell functions divided into three major archetypic hallmarks: (i) increased proliferation in IDH-wt and, to a lesser degree, IDH-O; (ii) increased inflammation in IDH-A and IDH-wt; and (iii) the loss of synaptic transmission in all subtypes. Immunogenic properties of IDH-A are diverse, partly resembling signatures observed in grade IV mesenchymal glioblastomas or in grade I pilocytic astrocytomas. We analyzed details of coregulation between gene expression and DNA methylation and of the immunogenic micro-environment presumably driving tumor development and treatment resistance. Our transcriptome and methylome maps support personalized, case-by-case views to decipher the heterogeneity of glioma states in terms of data portraits. Thereby, molecular cartography provides a graphical coordinate system that links gene-level information with glioma subtypes, their phenotypes, and clinical context

    Long-term environmental metal exposure is associated with hypomethylation of CpG sites in NFKB1 and other genes related to oncogenesis

    Background: Long-term environmental exposure to metals leads to epigenetic changes and may increase risks to human health. The relationship between the type and level of metal exposure and epigenetic changes in subjects exposed to high concentrations of metals in the environment is not yet clear. The aim of our study is to find the possible association of environmental long-term exposure to metals with DNA methylation changes of genes related to immune response and carcinogenesis. We investigated the association of plasma levels of 21 essential and non-essential metals detected by ICP-MS and the methylation level of 654 CpG sites located on the NFKB1, CDKN2A, ESR1, APOA5, IGF2 and H19 genes, assessed by targeted bisulfite sequencing in a cohort of 40 subjects living near a metal mining area and 40 unexposed subjects. Linear regression was conducted to find differentially methylated positions with adjustment for gender, age, BMI class, smoking and metal concentration. Results: In the metal-exposed group, five CpGs in the NFKB1 promoter region were hypomethylated compared to the unexposed group. Four differentially methylated positions (DMPs) were associated with multiple metals; two of them are located on the NFKB1 gene, and one each on the CDKN2A and ESR1 genes. Two DMPs located on NFKB1 (chr4:102500951, associated with Be) and IGF2 (chr11:2134198, associated with U) are associated with specific metal levels. The methylation status of the seven CpGs located on NFKB1 (3), ESR1 (2) and CDKN2A (2) positively correlated with plasma levels of seven metals (As, Sb, Zn, Ni, U, I and Mn). Conclusions: Our study revealed methylation changes in the NFKB1, CDKN2A, IGF2 and ESR1 genes in individuals with long-term exposure to metals. Further studies are needed to clarify the effect of environmental metal exposure on epigenetic mechanisms and pathways involved
